A model for the accidental catalysis of protein unfolding in vivo 



Richard P. Sear

Department of Physics, University of Surrey, Guildford, Surrey GU2 7XH, United Kingdom

Activated processes such as protein unfolding are highly sensitive to heterogeneity in the environment. We study a highly simplified model of a protein in a random heterogeneous environment, as a model of the in vivo environment. We find that if the heterogeneity is sufficiently large, the total rate of the process is essentially a random variable; this may be the cause of the species-to-species variability in the rate of prion protein conversion found by Deleault et al. [Nature, 425 (2003) 717].



Protein unfolding is implicated in a number of diseases, including prion diseases such as Creutzfeldt-Jakob disease [?]. It is an activated process: a free-energy barrier must be overcome for a protein to unfold from its native state. At the top of the barrier the protein is in the transition state for unfolding, and the free energy of the transition state determines the rate [?]. As the rate depends exponentially on this free energy, it is very sensitive to interactions between the transition state and other molecules in the environment. Inside living cells there is a mixture of thousands of different proteins, RNA molecules, etc. If any of them can interact with the transition state of unfolding such that its free energy is lowered by only a few $k_BT$, then the rate of prion protein conversion when interacting with this other molecule will be increased by an order of magnitude. Supattapone and coworkers [6] studied prion conversion in cell extracts and found that the rate of prion protein conversion was greatly accelerated by an RNA molecule or molecules, and that, surprisingly, this acceleration was specific to the RNA of only some species. Here we look at a very simple model of unfolding in vivo, and examine how the rate of unfolding is affected by the protein being in a complex mixture of many other molecules. Characterising the interactions of thousands of different molecules with the transition state is a hopeless task, and so we resort to a statistical approach [10, 11]. We take the interactions to be random variables. This reduces the problem from characterising a huge number of interactions to just characterising the distribution function of these random variables. By taking all the interactions to be random variables we are ignoring the fact that natural selection may be acting to restrict or increase the strength of some of the interactions, and so our model will be a poor one if the RNA accelerating the rate of prion protein conversion has evolved to interact strongly with the prion protein. Very little is definitely known about the function of the prion protein [?], and so we cannot rule out this possibility. We find that if the free energies of interaction with the transition state are spread over a wide range, unfolding occurs predominantly with the transition state in contact with one or a few of the other molecules present. These molecules are the ones responsible for the outliers of the distribution of interactions with the transition state: they are the ones that interact most strongly with the transition state. If we take these outliers to be RNA molecules, then the predictions of our model are consistent with the experimental findings of Supattapone and coworkers [6]. When one or a few outliers dominate the rate, it may vary significantly from species to species simply due to chance species-to-species variations in the nucleotide or amino-acid sequences of these outliers.

FIG. 1: Schematic representation of our starting model for the transition state in contact with a patch of surface. The surface is assumed planar for simplicity. Hydrophobic monomers are shown as the dark cubes, and hydrophilic monomers as the light cubes. The transition state is the set of $n_m = 7$ contiguous monomers, $B = 5$ of which are hydrophobic, on top of the surface.

Supattapone and coworkers [6] have shown that the conversion of a prion protein from the PrP$^{\mathrm{C}}$ form to the PrP$^{\mathrm{res}}$ form is greatly accelerated by a specific RNA molecule or by a small set of such molecules. The PrP$^{\mathrm{C}}$ form is the normal form, while the PrP$^{\mathrm{res}}$ form is analogous to the form associated with disease. The PrP$^{\mathrm{res}}$ form is so called because it is Protease RESistant, i.e., not destroyed by the proteases that cut the chains of normal proteins. The two forms of the protein have the same amino-acid sequence; they differ only in conformation. The interconversion is known to be accelerated by PrP$^{\mathrm{res}}$ itself, but Supattapone and coworkers showed that a specific fraction of RNA molecules from both hamsters and mice, but not the same fraction from invertebrates, also appeared to accelerate the conversion of the same protein. Of course, across the prion diseases of different species, the prion protein itself varies from species to species, and this will cause variability. Here we consider variability not in the prion protein itself but in a cofactor that interacts with the prion protein.

There are other experimental data on possible cofactors affecting the rate of prion protein conversion. Cordeiro et al. [7] suggest, on the basis of experimental evidence, that DNA reduces the free-energy barrier to the conversion of a prion protein into the form associated with the disease. Other work on prions has implicated as a cofactor not RNA but a protein dubbed protein X. There is considerable uncertainty surrounding the mechanism behind prion diseases [?]. See the reviews of Harris [?] and of Aguzzi and Polymenidou [?] for an introduction to prions.

We assume the unfolding of a protein to be a simple activated process [?], its rate depending exponentially on the barrier to unfolding, $\Delta F^*$: the difference in free energy between the folded protein and the transition state. The transition state is, by definition, the state of the protein along the unfolding pathway that has the highest free energy. Our model of the transition state for unfolding is a linear polymer on a simple cubic lattice, $n_m$ monomers long. Inside a living cell there are the surfaces of proteins, of membranes, of DNA, etc. For simplicity we lump all these surfaces together into a large flat surface, which we model by a plane of lattice sites. A transition state in contact with a part of this surface is shown in fig. 1. The monomers of the transition state and of the surface are either hydrophilic or hydrophobic. We take $B$ of the monomers of the transition state to be hydrophobic. We assume that unfolding proceeds by some part of the protein, $n_m$ monomers long, unfolding, its free energy increasing as it does so until the free energy reaches a maximum at the transition state [?]. This transition state can contact the surface, as seen in fig. 1, and for each hydrophobic monomer of the transition state in contact with a hydrophobic monomer of the surface there is a contribution of $-\epsilon$ to the free energy of the transition state. The only interaction energy is between hydrophobic monomers.

The surfaces are those of proteins, RNA, etc., and so are coded for by the genome of the organism. Thus they will differ between one species and another. We have no means of calculating them from the genome of an organism, and so we resort to modelling the surface with a purely random distribution of hydrophobic and hydrophilic monomers. Each monomer is hydrophobic with probability $h$ and hydrophilic with probability $1-h$. This is in the spirit of the approach pioneered by Wigner and others [10] in random matrix theory; see ref. [11] for an application to protein mixtures. The surface provides $N_s$ different positions and configurations in which the transition state can interact with the surface; we call these unfolding configurations. We neglect any correlations between the interaction energies at different unfolding configurations on the surface and assume that the $N_s$ configurations are independent. Then, if we denote the free energy of the transition state when it is not interacting with any other monomers by $\Delta F^*_0$, the rate of unfolding at configuration $i$, $R_i$, is

$$R_i = \nu \exp\left(-\Delta F^*_0 + m_i\epsilon\right), \tag{1}$$

where $m_i$ is the number of hydrophobic monomers of the transition state that are adjacent to hydrophobic parts of the surface. Thus, the surfaces present are specified by the set of $N_s$ values of the random variables $m_i$. Note that we have assumed that the attempt frequency $\nu$ is the same for all unfolding configurations; only the free-energy barrier varies. We use units such that the thermal energy $k_BT = 1$.

The rate of unfolding averaged over all $N_s$ possible configurations is

$$R = N_s^{-1}\sum_{i=1}^{N_s} R_i. \tag{2}$$
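To make eqs. (1) and (2) concrete, here is a minimal Python sketch that generates one random surface realisation and evaluates $R$. It assumes, as in the model, that each $m_i$ counts how many of the $B$ hydrophobic transition-state monomers sit on hydrophobic surface sites, each independently with probability $h$. The function name `surface_rate` and the defaults $\nu = 1$, $\Delta F^*_0 = 0$ are illustrative choices, not part of the original model.

```python
import math
import random


def surface_rate(n_s, b, h, eps, nu=1.0, dF0=0.0, rng=None):
    """Rate R of eq. (2) for one random surface realisation (illustrative helper).

    Each of the n_s unfolding configurations gets m_i hydrophobic contacts,
    a sum of b independent 0/1 variables that equal 1 with probability h.
    Each configuration unfolds at R_i = nu * exp(-dF0 + m_i * eps), eq. (1),
    and R is the average of the R_i over the n_s configurations.
    """
    rng = rng if rng is not None else random.Random()
    total = 0.0
    for _ in range(n_s):
        m_i = sum(1 for _ in range(b) if rng.random() < h)
        total += nu * math.exp(-dF0 + m_i * eps)
    return total / n_s


# Two independent realisations with the same parameters (two "species").
rng = random.Random(42)
print(surface_rate(10_000, b=5, h=0.5, eps=2.0, rng=rng))
print(surface_rate(10_000, b=5, h=0.5, eps=2.0, rng=rng))
```

Calling the function repeatedly with the same parameters but fresh random draws mimics the comparison between uncorrelated realisations used below.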

Although we have used the specific example of protein unfolding, the rates of activated processes are quite generally given by equations of the form of eq. (1), and so our theory applies generally to activated processes in vivo. Equations similar to eqs. (1) and (2) were employed by Karpov and Oxtoby [12] to study nucleation, an activated process like unfolding, in the presence of random static disorder. The author has also applied the approach used here to nucleation [?], and this reference may be consulted for more details of the analysis performed below. The analysis required for nucleation is very similar to that required for our model of unfolding.

Different organisms have different genomes and so different sets of proteins, etc., inside their cells. Supattapone and coworkers [6] found that RNA molecules from mammals accelerated the protein conformational conversion whereas RNA from invertebrates did not. Thus we would like to model, and try to understand, species-to-species variability. To do so we simply assume that the surfaces in different species are uncorrelated; two species are then modelled by two uncorrelated realisations of the surface. Of course, the surfaces present in closely related species in particular will be correlated due to their similar genomes, but we leave the introduction of such correlations to future work.

Continuing, as only hydrophobic monomers interact, $m_i$ is a sum of $B$ independent random variables that are 1 with probability $h$ and 0 with probability $1-h$. So the probability distribution function of $m_i$, $p(m_i)$, is

$$p(m_i) = \frac{B!}{m_i!\,(B-m_i)!}\,h^{m_i}(1-h)^{B-m_i} \simeq \frac{1}{(2\pi w^2)^{1/2}}\exp\left[-\frac{(m_i-\bar m)^2}{2w^2}\right], \tag{3}$$

where we have indicated that $p(m_i)$ is approximately a Gaussian for large $B$. Here $\bar m = Bh$ is the mean and $w^2 = Bh(1-h)$ is the variance. From now on we neglect any deviations from the simple Gaussian distribution function of eq. (3), as well as the discrete nature of $m_i$, and use this Gaussian for $p(m_i)$.
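The quality of the Gaussian approximation in eq. (3) can be checked directly. The minimal sketch below, with hypothetical helper names, compares the exact binomial probabilities with the Gaussian of mean $\bar m = Bh$ and variance $w^2 = Bh(1-h)$ at the two ends of the range of $B$ used later.

```python
import math


def binomial_pmf(m, b, h):
    """Exact p(m): probability that m of the b hydrophobic monomers make contacts."""
    return math.comb(b, m) * h**m * (1.0 - h) ** (b - m)


def gaussian_pdf(m, b, h):
    """Gaussian approximation of eq. (3), mean b*h and variance b*h*(1-h)."""
    mean, var = b * h, b * h * (1.0 - h)
    return math.exp(-((m - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def max_abs_error(b, h):
    """Largest absolute difference between the two over m = 0..b."""
    return max(abs(binomial_pmf(m, b, h) - gaussian_pdf(m, b, h)) for m in range(b + 1))


for b in (5, 15):
    print(b, max_abs_error(b, 0.5))
```

The discrepancy shrinks as $B$ grows, which is the regime in which eq. (3) is used.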

Having chosen to model different species by uncorrelated realisations, we will examine fluctuations of the rate $R$ between different realisations. We assume this variation between realisations is a reasonable model for variations between species. Returning to eq. (2) for the rate and using eq. (1), we obtain

$$R = N_s^{-1}\nu\exp(-\Delta F^*_0)\sum_{i=1}^{N_s}\exp(m_i\epsilon), \tag{4}$$

where the $m_i$ are taken to be random variables drawn from the Gaussian distribution of eq. (3). Except for constant factors, the rate $R$ is equivalent to the partition function of the Random Energy Model (REM) of Derrida [?]. The REM is a simple and well-studied model of glasses and other disordered systems.

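Just as the average partition function of the REM can be obtained, we can obtain the average of the rate $R$,

$$\langle R\rangle = N_s^{-1}\nu\exp(-\Delta F^*_0)\left\langle\sum_{i=1}^{N_s}\exp(m_i\epsilon)\right\rangle \tag{5}$$

$$= \nu\exp(-\Delta F^*_0)\exp\left(\bar m\epsilon + \epsilon^2 w^2/2\right), \tag{6}$$

where in the second step we used $\langle\exp(m_i\epsilon)\rangle = \exp(\bar m\epsilon + \epsilon^2 w^2/2)$ for a Gaussian $m_i$ of mean $\bar m$ and variance $w^2$.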

This is the average of $R$ over many different realisations of the surface, i.e., over many different sets of the $N_s$ random variables $m_i$ that define a surface. As $R$ is a sum of random variables, it is itself a random variable. For the large values of $N_s$ considered here, the rate $R$ is either self-averaging or non-self-averaging. It is self-averaging if for almost all realisations the rate of unfolding $R$ is close to $\langle R\rangle$, i.e., if $R$ is almost the same for almost all realisations. Then the right-hand side of eq. (6) will be a good approximation to the rate $R$ of any realisation. If it is non-self-averaging, then the rate $R$ differs appreciably from one realisation to another, the values of $R$ have a large spread, and eq. (6) is unlikely to provide a good approximation to the value of $R$ for a randomly selected realisation. $R$ is non-self-averaging if and only if the sum of eq. (4) is dominated by one or a few terms: the variation then comes from variation in the values of the largest terms in the sum. This is just as in the REM; see ref. [?] for details.
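Eq. (6) rests on the Gaussian identity $\langle\exp(m\epsilon)\rangle = \exp(\bar m\epsilon + \epsilon^2w^2/2)$. A quick Monte Carlo check of that step is sketched below, assuming $\nu = 1$ and $\Delta F^*_0 = 0$ for simplicity; the function names are illustrative.

```python
import math
import random


def mean_rate_exact(mbar, w, eps, nu=1.0, dF0=0.0):
    """<R> from eq. (6)."""
    return nu * math.exp(-dF0) * math.exp(mbar * eps + 0.5 * (eps * w) ** 2)


def mean_rate_mc(mbar, w, eps, n_samples, seed=0, nu=1.0, dF0=0.0):
    """Monte Carlo estimate of <R>: average exp(m*eps) over Gaussian draws of m.

    Pooling the m_i of many realisations is enough here, because <R> in
    eq. (5) only involves the single-configuration average <exp(m_i*eps)>.
    """
    rng = random.Random(seed)
    acc = sum(math.exp(rng.gauss(mbar, w) * eps) for _ in range(n_samples))
    return nu * math.exp(-dF0) * acc / n_samples


print(mean_rate_exact(2.5, 1.1, 1.0), mean_rate_mc(2.5, 1.1, 1.0, 200_000))
```

For larger $\epsilon w$ the estimate converges much more slowly, because the average becomes dominated by rare, large values of $m$; this is the same effect that makes a single realisation a poor guide to $\langle R\rangle$ in the non-self-averaging regime.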

Recall that we are assuming that a realisation corresponds to a species. Thus, if $R$ is self-averaging, our model predicts that the rate of unfolding of a particular protein is almost the same in all or almost all species, whereas if it is not self-averaging, the rate of unfolding of a specific protein will vary significantly from one species to another.
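The distinction can be seen numerically by treating each realisation as a "species" and measuring how much $R$ varies between them. The sketch below is an illustration only: it uses the Gaussian model of eq. (3) in the reduced variable $y_i = (m_i - \bar m)/w$ and reports $R$ in units of $\nu\exp(-\Delta F^*_0 + \bar m\epsilon)$, so that only $\epsilon w$ matters. The helper names and sample sizes are arbitrary choices.

```python
import math
import random
import statistics


def reduced_rate(n_s, eps_w, rng):
    """R of eq. (4) for one realisation, in units of nu*exp(-dF0 + mbar*eps).

    With m_i = mbar + w*y_i and y_i a unit Gaussian, exp(m_i*eps) is
    exp(mbar*eps) * exp(eps_w * y_i), so only the product eps*w enters.
    """
    return sum(math.exp(eps_w * rng.gauss(0.0, 1.0)) for _ in range(n_s)) / n_s


def relative_spread(n_s, eps_w, n_species, seed=0):
    """Standard deviation of R across realisations divided by its mean."""
    rng = random.Random(seed)
    rates = [reduced_rate(n_s, eps_w, rng) for _ in range(n_species)]
    return statistics.pstdev(rates) / statistics.fmean(rates)


for eps_w in (1.0, 5.0):
    print(eps_w, relative_spread(n_s=1_000, eps_w=eps_w, n_species=50))
```

For $\epsilon w = 1$ the relative spread is small, of order $[\exp(\epsilon^2w^2)-1]^{1/2}/N_s^{1/2}$ as expected from the central-limit theorem, while for $\epsilon w = 5$ the values of $R$ scatter widely between realisations.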

We will now determine the boundary where the rate $R$ crosses over from self-averaging to non-self-averaging. From eq. (4) we see that the rate $R$ is dominated by unfolding configurations with values of $m_i$ where the product of the number of configurations and $\exp(m_i\epsilon)$ is a maximum. The number of configurations is simply proportional to the probability of eq. (3). The maximum of the product $p(m_i)\exp(m_i\epsilon)$ is at $m_{\max} = \bar m + \epsilon w^2$. Now, the average number of configurations around this value of $m_i$ is just $N_s p(m_{\max})$, and because this number is a sum over independent random variables (the $m_i$), the ratio of its fluctuations to its mean scales as $[N_s p(m_{\max})]^{-1/2}$. Thus the fluctuations in the number of configurations that contribute the dominant amount to the rate, and hence the fluctuations in the rate itself, are small relative to the mean if and only if $N_s p(m_{\max}) \gg 1$. Since $p(m_{\max}) = (2\pi w^2)^{-1/2}\exp(-\epsilon^2 w^2/2)$, to leading exponential order this holds whenever $2\ln N_s - \epsilon^2 w^2 > 0$.

Thus, the boundary between the self-averaging and non-self-averaging regimes is given by the equation

$$2\ln N_s - \epsilon^2 w^2 = 0. \tag{7}$$

Note that $\epsilon^2 w^2$ is the variance of the distribution of interaction energies between the transition state and the surface. Thus the rate is self-averaging if and only if the logarithm of the number of possible configurations in which the transition state can unfold is larger than half the variance of the interaction energy. This is the main result of this work. It is a very general result: it applies to activated processes in any random or near-random environment. Our conclusions here apply to any process with a rate given by an equation of the form of eq. (4), not just to protein unfolding in vivo. See ref. [?] for an application to nucleation at first-order phase transitions.
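Eq. (7) is easy to evaluate. The minimal sketch below (helper names are illustrative) solves it for $\epsilon w$, which for $N_s$ = 10,000 should reproduce the value $\epsilon w = 4.29$ quoted later in the text. It also evaluates $N_s\,p(m_{\max})$ including the $(2\pi w^2)^{-1/2}$ prefactor that eq. (7) drops; whether this number is much larger or much smaller than 1 decides which side of the boundary a parameter set lies on.

```python
import math


def boundary_eps_w(n_s):
    """Solve eq. (7), 2 ln N_s - (eps*w)^2 = 0, for eps*w."""
    return math.sqrt(2.0 * math.log(n_s))


def dominant_count(n_s, eps, w):
    """Mean number of configurations near m_max = mbar + eps*w^2.

    N_s * p(m_max) with the Gaussian of eq. (3); the rate self-averages
    only when this is much larger than 1.
    """
    return n_s * math.exp(-0.5 * (eps * w) ** 2) / math.sqrt(2.0 * math.pi * w * w)


for n_s in (1_000, 10_000, 100_000):
    print(n_s, round(boundary_eps_w(n_s), 2))
```

The boundary grows only as $(\ln N_s)^{1/2}$, so even a hundredfold change in the number of configurations shifts it by little more than one unit of $\epsilon w$.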

In the non-self-averaging regime, a single unfolding configuration can be responsible for a significant fraction of the entire rate of unfolding at the surface. This configuration must of course be the configuration with the largest value of $m_i$. We denote this largest value by $x$. If we define the probability distribution function, $p_{ev}(x)$, of $x$, then the mean fraction of the rate $R$ that is due to this extreme value is

$$f_{ev} = \frac{\nu\exp(-\Delta F^*_0)}{N_s\langle R\rangle}\int dx\, p_{ev}(x)\exp(\epsilon x). \tag{8}$$

We can simplify eq. (8) by introducing the reduced variable $y = (x - \bar m)/w$. Then, from eq. (8) and using eq. (6) for $\langle R\rangle$, we obtain

$$f_{ev} = N_s^{-1}\exp\left(-(\epsilon w)^2/2\right)\int dy\,\exp(\epsilon w y)\,p_{ev}(y), \tag{9}$$

where $p_{ev}(y)$ is the probability distribution function for the maximum value of a set of $N_s$ values taken from a Gaussian of zero mean and unit standard deviation. Note that although the absolute value of the rate $R$ and of the contribution of the extreme value both depend on the mean $\bar m$, $f_{ev}$ does not. It depends only on the product $\epsilon w$ and on $N_s$.

The determination of $p_{ev}(y)$ is a standard problem in extreme-value statistics [16]. We start from the fact that the probability that the largest of $N_s$ values is $y$ is the probability that one of the $N_s$ configurations has the value $y$ and all the remaining $N_s - 1$ configurations have smaller values, multiplied by $N_s$, as any one of the $N_s$ configurations can have the largest value. Thus,

$$p_{ev}(y) = N_s\,p(y)\,p_<^{N_s-1}(y), \tag{10}$$

where $p(y)$ is a normalised Gaussian of zero mean and unit standard deviation, and $p_<(y)$ ($p_>(y)$) is the probability of obtaining a number less (greater) than $y$ from the same Gaussian. We are interested in the region where $x$ is several standard deviations above the mean, i.e., $y \gg 1$. Now, $p_< = 1 - p_>$, so for $y \gg 1$ we have $p_> \ll 1$ and $(1-p_>)^{N_s-1} \simeq \exp(-N_s p_>)$, and we can rewrite eq. (10) as

$$p_{ev}(y) \simeq N_s\,p(y)\exp\left[-N_s\,p_>(y)\right]. \tag{11}$$

Also, $p_>(y) = (1/2)\,\mathrm{erfc}(y/2^{1/2})$, which for $y \gg 1$ simplifies to $p_>(y) \simeq \exp(-y^2/2)/[(2\pi)^{1/2}y]$.

FIG. 2: The mean fraction, $f_{ev}$, of the rate $R$ that is due to the configuration with the largest $m_i$, as a function of the product of the width of the Gaussian, $w$, and the interaction energy $\epsilon$. The solid, dashed and dotted curves are for $N_s$ = 1,000, 10,000 and 100,000 configurations, respectively.
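The curves of fig. 2 can be recomputed, at least in outline, by evaluating eq. (9) numerically. The sketch below uses the exact extreme-value density of eq. (10) rather than the approximation of eq. (11), and a plain trapezoidal rule on a finite interval of $y$; the integration limits and step count are assumptions chosen to cover the values of $N_s$ and $\epsilon w$ discussed in the text, and all function names are hypothetical.

```python
import math


def p_gauss(y):
    """Unit Gaussian density p(y)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def p_less(y):
    """p_<(y): probability that a unit Gaussian variable is below y."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def p_greater_asymptotic(y):
    """Large-y form p_>(y) ~ exp(-y^2/2) / [(2 pi)^(1/2) y]."""
    return math.exp(-0.5 * y * y) / (math.sqrt(2.0 * math.pi) * y)


def p_ev(y, n_s):
    """Eq. (10): density of the largest of n_s unit-Gaussian values."""
    return n_s * p_gauss(y) * p_less(y) ** (n_s - 1)


def f_ev(eps_w, n_s, y_min=-2.0, y_max=12.0, n_steps=20_000):
    """Eq. (9), integrated with the trapezoidal rule on [y_min, y_max]."""
    dy = (y_max - y_min) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        y = y_min + k * dy
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * math.exp(eps_w * y) * p_ev(y, n_s)
    return math.exp(-0.5 * eps_w**2) * total * dy / n_s


for eps_w in (2.0, 4.29, 6.0):
    print(eps_w, f_ev(eps_w, 10_000))
```

At $\epsilon w = 0$ every configuration contributes equally and the sketch returns $f_{ev} = 1/N_s$; as $\epsilon w$ grows, $f_{ev}$ rises towards 1.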

In fig. 2 we have plotted the fraction $f_{ev}$ of the rate due to the configuration with the largest interaction energy, and so the lowest barrier, as a function of $\epsilon w$. We took $N_s$ = 1,000, 10,000 and 100,000. Assuming that there are a few thousand different molecular species inside a cell and that each can potentially interact with the transition state in a few ways, we end up with the estimate $N_s \approx 10^4$. The other parameter is $\epsilon w$. The interaction strength of a pair of monomers is expected to lie in the range 1 to 3 (recall that $\epsilon$ is in units of $k_BT$). If the fraction of hydrophobic monomers is $h \approx 1/2$, then for $B \approx 5$ to 15 hydrophobic monomers we have $w \approx 1$ to 2. Combining these values for $\epsilon$ and $w$, we have $\epsilon w \approx 0.5$ to 6. Returning to fig. 2, we see that as $\epsilon w$ increases, so does $f_{ev}$. For $N_s$ = 10,000, eq. (7) is satisfied for $\epsilon w = 4.29$. For $\epsilon w$ around this value, the configuration with the largest interaction energy already contributes a large amount to the total rate, on average. This large contribution will vary significantly from one realisation to the next, i.e., from one species to the next. So the rate of unfolding of the protein will vary significantly from one species to the next, depending on whether the species has some part of a protein, RNA molecule, etc., that binds to the transition state unusually strongly. Our estimate for the possible values of $\epsilon w$ in vivo goes up to around 6, so we estimate that the variation in the interaction free energies with a transition state may be large enough to cause random species-to-species variation. Within our model, the RNA molecule or molecules found to catalyse the conversion are the origin of one of the configurations that are outliers of the distribution, i.e., that interact most strongly with the transition state. Of course, if $\epsilon w$ is small, then the rate $R$ has significant contributions from many unfolding configurations and so varies only weakly from species to species, essentially because variations in the rate are averaged out in accordance with the central-limit theorem.
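The parameter estimates in the preceding paragraph amount to a few lines of arithmetic. A minimal sketch, taking $h = 1/2$, $B$ = 5 to 15 and an upper interaction strength $\epsilon = 3$ (in units of $k_BT$) as inputs, with illustrative helper names:

```python
import math


def width(b, h):
    """w = [B h (1 - h)]^(1/2), the standard deviation of m_i from eq. (3)."""
    return math.sqrt(b * h * (1.0 - h))


def boundary_eps_w(n_s):
    """eps*w at which eq. (7) is satisfied."""
    return math.sqrt(2.0 * math.log(n_s))


w_low, w_high = width(5, 0.5), width(15, 0.5)
eps_high = 3.0
print("w range:", round(w_low, 2), "to", round(w_high, 2))
print("largest eps*w:", round(eps_high * w_high, 2))
print("boundary for N_s = 10^4:", round(boundary_eps_w(10_000), 2))
```

This gives $w \approx 1.1$ to 1.9, and the upper end, $\epsilon w \approx 5.8$, lies above the boundary value 4.29, which is why the non-self-averaging regime is plausible in vivo.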

In conclusion, Supattapone and coworkers [6] have found that cell extracts of some species but not others accelerate the conversion of the prion protein to a protease-resistant form. This conformational change must involve partial unfolding. Protein unfolding in vivo or in a cell extract occurs in a very complex and heterogeneous environment. There are a huge number of molecular species present that could potentially interact with and stabilise the transition state of unfolding. A single strongly stabilising interaction could dramatically increase the rate of unfolding. Here we have suggested a possible model for the species-to-species variation in the ability of cell extracts to accelerate prion protein conversion [6]. The model is a statistical one: interactions are modelled by random variables, and different species by different uncorrelated realisations of the random interactions. We suggest that the acceleration is due to a strong interaction of the transition state for prion protein conversion with one or a few species of RNA molecules, and that this interaction is strong simply by chance: it is accidental that these RNA molecules reduce the free-energy barrier to unfolding. The species-to-species variation then comes simply from the variation in the nucleotide sequences of RNA molecules from species to species. The RNA molecules that perform the same function in, say, mice and fruit flies will have similar but not identical nucleotide sequences, and so will have different interaction free energies with the transition state. Proving this suggestion would require identifying the RNA molecule or molecules that interact with the prion protein and then demonstrating that there is no functional relationship between the protein and the RNA. Falsifying the suggestion is perhaps more straightforward: it only requires finding a functional relationship. Finally, it should be noted that it is also possible that the RNA molecule or molecules have evolved to interact with the prion protein, although we know of no evidence that they are under selection pressure to interact specifically with the transition state.